Global perspective of environmental distribution and diversity of Perkinsea (Alveolata) explored by a meta-analysis of eDNA surveys

Perkinsea constitutes a lineage within the Alveolata eukaryotic superphylum, mainly composed of parasitic organisms. Some described species represent significant ecological and economic threats due to their invasive ability and pathogenicity, which can lead to mortality events. However, the genetic diversity of these described species is just the tip of the iceberg. Environmental surveys targeting this lineage are still scarce and mainly limited to the Northern Hemisphere. Here, we aim to conduct an in-depth exploration of the Perkinsea group, uncovering its diversity across a variety of environments, including those beyond freshwater and marine ecosystems. We seek to identify and describe putative novel organisms based on their genetic signatures. In this study, we conducted an extensive analysis of a metabarcoding dataset focusing on the V4 region of the 18S rRNA gene (the EukBank dataset) to investigate the diversity, distribution and environmental preferences of the Perkinsea. Our results reveal a remarkable diversity within the Perkinsea, with 1568 Amplicon Sequence Variants (ASVs) identified across thousands of environmental samples. Surprisingly, we found a substantial diversity of Perkinsea within soil samples (269 ASVs), challenging the previous assumption that this group is confined to marine and freshwater environments. In addition, we revealed that a notable proportion of Perkinsea ASVs (428 ASVs) could correspond to putative new organisms, including some within the well-established taxonomic group Perkinsidae. Finally, our study sheds light on previously described taxonomic groups, including the Xcellidae, and reveals their environmental distribution. These findings demonstrate that Perkinsea exhibits far greater diversity than previously detected and, surprisingly, extends beyond marine and freshwater environments.
The meta-analysis conducted in this study has unveiled the existence of previously unknown clusters within the Perkinsea lineage, identified solely on the basis of their genetic signatures. Considering the ecological and economic importance of described Perkinsea species, these results suggest that Perkinsea may play a significant, yet previously unrecognized, role across a wide range of environments, spanning from soil environments to the abyssal zone of the open ocean, with important implications for ecosystem functioning.


The EukBank database and Perkinsea ASVs phylogenetic classification
The EukBank database is a compilation of eDNA surveys that employed high-throughput sequencing methods (Illumina MiSeq and Roche 454) targeting the hypervariable region V4 of the SSU rDNA sequence 37. Briefly, raw sequences were obtained from the EMBL/EBI-ENA EukBank umbrella project. When applicable, reads were trimmed with Cutadapt (https://github.com/marcelm/cutadapt/) 38, with specific parameters tailored to extract fragments covered by the primer sets TAReuk454FWD1 and TAReukREV3 from the V4 region of the SSU rRNA gene 39. Identical sequences were merged with VSEARCH 40, followed by clustering with Swarm 41. Subsequently, chimera detection was conducted using the --uchime_denovo function in VSEARCH 42, and low-quality sequences were filtered out. The final set of ASVs was obtained based on occurrence patterns, utilizing a modified version of the Lulu algorithm 43, which can be found at https://github.com/frederic-mahe/mumu. Taxonomic classification of the ASVs was performed using the curated EukRibo database version 1.0 44, employing the global pairwise alignment approach (--usearch_global from VSEARCH). This database was generated by the UniEuk consortium 37.
A phylogenetic placement method was applied to ASVs initially affiliated with Perkinsea 45,46. The sequences were added to the reference alignment using MAFFT with the --addfragments and --keeplength parameters and then placed onto the reference tree using the evolutionary placement approach EPA-ng v0.3.8 46. The results were subsequently analyzed using Gappa 47. ASVs not affiliated with Perkinsea were discarded. These included ASVs placed within the outgroup or those with long branches that could not be confidently assigned through manual inspection using NCBI BLASTn with default parameters 48.
To analyze the composition of Perkinsea diversity, a non-metric multidimensional scaling (NMDS) ordination based on the Bray-Curtis dissimilarity was conducted using the vegan package v2.5-7 49. Samples with more than 10,000 reads and at least one Perkinsea ASV were extracted from the EukBank dataset and rarefied using the rrarefy function from the vegan package. From each environmental type, the 15 samples with the highest Shannon diversity were selected to create a subset of samples representing the highest observed diversity. This approach was adopted to avoid outliers, as many samples contained only one or a few ASVs with very low abundance. Subsequently, the samples were transformed using the Hellinger method 50 with the decostand function, and the NMDS analysis was performed using the Bray-Curtis dissimilarity index with the metaMDS function. To test for significant differences between the communities of the main categories and between environments, we used the non-parametric ANOSIM test (ANalysis Of SIMilarities) with the anosim function from the vegan package and pairwise.adonis from the pairwiseAdonis package 51 with the parameter sim.method = bray.
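The two numerical steps named above, the Hellinger transformation and the Bray-Curtis dissimilarity, can be illustrated with a minimal pure-Python sketch. It is not the vegan implementation used in the study; the sample names and read counts are hypothetical, and the functions only follow the textbook definitions (square root of relative abundance, and the sum of absolute differences divided by the sum of totals).

```python
import math


def hellinger(counts):
    """Hellinger transform: square root of each ASV's relative abundance."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.sqrt(c / total) for c in counts]


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator if denominator else 0.0


# Hypothetical rarefied read counts for four ASVs in three samples.
samples = {
    "marine_1": [40, 10, 0, 0],
    "land_water_1": [5, 5, 20, 20],
    "soil_1": [0, 0, 25, 25],
}

# Transform every sample, then print all pairwise dissimilarities.
transformed = {name: hellinger(c) for name, c in samples.items()}
names = sorted(transformed)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = bray_curtis(transformed[first], transformed[second])
        print(first, second, round(d, 3))
```

The resulting pairwise matrix is the kind of input an NMDS routine would take; the ordination itself is left to a dedicated package.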
For the community structure analysis based on phylogenetic distance, the Picante and Phyloseq R packages were used 52,53. The alignment of the ASVs and the outgroup sequences, obtained from the previous Perkinsea ASVs phylogenetic classification, was extracted. The best-ML tree and the rarefied table were used to calculate the sample distance using the weighted UniFrac method. The same samples used in the NMDS analysis were also employed for a principal coordinates analysis (PCoA). Additionally, Faith's Phylogenetic Diversity (PD) 54 and the Net Relatedness Index (NRI) 55 were calculated for each environment. The phylogenetic difference between the environments was tested with pairwise.adonis using the weighted UniFrac distance matrix.
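As an illustration of Faith's PD, the sketch below sums the branch lengths of every edge that lies on a path from an observed ASV to the root of a small tree. The tree, node names and branch lengths are hypothetical, and counting branches back to the root is one common convention assumed here; this is not the Picante code used in the study.

```python
def faith_pd(parent, branch_length, observed_tips):
    """Sum the lengths of all branches connecting the observed tips to the root.

    parent maps each non-root node to its parent node;
    branch_length maps each non-root node to the length of the branch above it.
    """
    visited = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk towards the root, counting each branch only once.
        while node in parent and node not in visited:
            visited.add(node)
            total += branch_length[node]
            node = parent[node]
    return total


# Hypothetical tree: root -> asv_a, root -> n1, n1 -> asv_b, n1 -> asv_c
parent = {"asv_a": "root", "n1": "root", "asv_b": "n1", "asv_c": "n1"}
branch_length = {"asv_a": 1.0, "n1": 0.5, "asv_b": 0.2, "asv_c": 0.3}

# A sample containing two closely related ASVs versus two distant ones.
print(faith_pd(parent, branch_length, ["asv_b", "asv_c"]))
print(faith_pd(parent, branch_length, ["asv_a", "asv_b"]))
```

Because shared branches are counted once, a sample of closely related ASVs yields a lower PD than a sample of the same size drawn from distant lineages.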
To explore the global distribution of Perkinsea, the proportion of ASVs related to each taxonomic group observed in the different geographical regions was plotted on a world map. The environmental preferences or occurrences were investigated by calculating the number of ASVs observed in each environment, represented using a chord diagram created with the R package circlize 56. To assess the overlap of ASVs between the 'Marine', 'Land water' and 'Soil' categories, Venn diagrams were generated using the R package nVennR 57. All the analyses were conducted in R (R Core Team, 2020) and scripts for these analyses are available on GitHub (https://github.com/sebametz/perkinsea_distribution).
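The overlap counts behind such a Venn diagram reduce to grouping each ASV by the exact set of environments in which it was observed. The following sketch shows that grouping with plain Python sets; the ASV identifiers are hypothetical, and the function is an illustration rather than the nVennR workflow.

```python
def venn_regions(asvs_by_env):
    """Group ASVs by the exact combination of environments they occur in."""
    regions = {}
    all_asvs = set().union(*asvs_by_env.values())
    for asv in all_asvs:
        key = frozenset(env for env, asvs in asvs_by_env.items() if asv in asvs)
        regions.setdefault(key, set()).add(asv)
    return regions


# Hypothetical ASV occurrences in the three main categories.
asvs_by_env = {
    "Marine": {"asv_1", "asv_2", "asv_3"},
    "Land water": {"asv_2", "asv_3", "asv_4"},
    "Soil": {"asv_3", "asv_4", "asv_5"},
}

regions = venn_regions(asvs_by_env)
for envs, asvs in sorted(regions.items(), key=lambda item: sorted(item[0])):
    print(sorted(envs), len(asvs))
```

Each key is one region of the diagram, so the size of the set stored under a single-environment key is the number of ASVs exclusive to that environment.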

ASVs related to potential novel Perkinsea organisms
To test the novelty of each ASV, two criteria were selected: the similarity of each ASV to its closest related sequence from a curated Perkinsea dataset of sequences obtained from NCBI, and the likelihood weight ratio (LWR) derived from the placement of the ASVs into the reference phylogenetic tree. The LWR serves as an indicator of the confidence in the placement of the ASVs within the phylogenetic tree 47. A low LWR value suggests a less reliable representation of the ASV's taxonomy in the phylogeny. Since our phylogeny aimed to encompass the known diversity of Perkinsea, ASVs with low similarity to reference sequences from the database and low LWR are likely associated with potential novel organisms whose taxonomy is unknown. We classified an ASV as novel if its similarity percentage was below the mean similarity with the reference dataset (< 94.8%) and if its LWR was below the mean observed for the placement of all Perkinsea ASVs (< 0.69). The environmental distribution of the novel ASVs was investigated using a combination of their phylogenetic placement and the number of observations of each ASV in different samples from each environment subcategory. Using this methodology, we identified environments that potentially harbor novel Perkinsea organisms.
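The two-criterion rule can be written as a short filter. The thresholds (94.8% similarity and an LWR of 0.69) come from the text above; the ASV names and values are hypothetical and serve only to show that both conditions must hold at once.

```python
SIMILARITY_THRESHOLD = 94.8  # mean % similarity to the reference dataset
LWR_THRESHOLD = 0.69         # mean likelihood weight ratio of all placements


def is_novel(similarity_pct, lwr):
    """An ASV is flagged as novel only if it falls below both thresholds."""
    return similarity_pct < SIMILARITY_THRESHOLD and lwr < LWR_THRESHOLD


# Hypothetical ASVs: (name, % similarity to closest reference, LWR)
asvs = [
    ("asv_a", 99.1, 0.95),  # similar and confidently placed
    ("asv_b", 90.2, 0.41),  # dissimilar and poorly placed
    ("asv_c", 92.0, 0.88),  # dissimilar but confidently placed
    ("asv_d", 96.5, 0.30),  # similar but poorly placed
]

novel = [name for name, similarity, lwr in asvs if is_novel(similarity, lwr)]
print(novel)
```

Requiring both conditions keeps out ASVs that are merely divergent within a well-supported clade, as well as ASVs that are ambiguously placed despite a close reference match.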

Xcellidae distribution in the open ocean
In our dataset, we identified a notable presence of ASVs related to Xcellidae organisms, which were unexpectedly and widely distributed in the deep open ocean (below 200 m depth). To further investigate this enigmatic group, we focused on the published dataset from the Malaspina 2010 Circumnavigation Expedition 58. This dataset is unique because it includes a wide range of open-ocean samples from the surface to the bathypelagic zone. For each sample, the V4 region of the SSU rRNA gene derived from DNA and RNA templates of the pico-eukaryotic fraction (between 0.2 and 3 µm) was sequenced. Samples were processed using DADA2 59 and taxonomically annotated as described in Obiol et al. 60. ASVs related to Perkinsea were extracted and subjected to the same phylogenetic analyses, as described above, to confirm their affiliations within the Perkinsea lineage.
Although DNA-based sequences contribute to elucidating the Perkinsea community structure, these datasets also include metabolically inactive or dead cells. Hence, the rRNA/rDNA ratio can be used as a proxy for the 'relative' ribosomal activity of the identified Xcellidae clusters 58. This ratio was calculated based on the contribution of each ASV classified as Xcellidae to the RNA- and DNA-derived samples.
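A minimal sketch of such a ratio is given below. It assumes that the 'contribution' of an ASV is its share of the total reads in a sample, which is an interpretation for illustration and not a detail confirmed by the text; the read counts are hypothetical.

```python
def rrna_rdna_ratio(rna_reads, rna_total, dna_reads, dna_total):
    """Ratio of an ASV's relative read share in RNA versus DNA libraries.

    Returns None when the ASV is absent from the DNA library, because the
    ratio is undefined in that case.
    """
    if rna_total <= 0 or dna_total <= 0:
        raise ValueError("library totals must be positive")
    dna_share = dna_reads / dna_total
    if dna_share == 0:
        return None
    return (rna_reads / rna_total) / dna_share


# Hypothetical ASV: 30 of 10,000 RNA reads and 10 of 10,000 DNA reads.
ratio = rrna_rdna_ratio(30, 10_000, 10, 10_000)
print(ratio)
```

Under this reading, a ratio above 1 means the ASV is relatively more represented in the RNA-derived library than in the DNA-derived one, which is the pattern interpreted as a sign of ribosomal activity.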

Phylogenetic classification of environmental Perkinsea sequences
Using a phylogenetic approach, we investigated the diversity and community structure of Perkinsea ASV sequences retrieved from the published EukBank dataset 37.
We first conducted a phylogenetic analysis using a reference tree derived from an alignment of 1790 characters, which included representative alveolate groups and environmental sequences. This analysis successfully retrieved all major clusters described in the literature, including NAG01, Perkinsidae, Xcellidae, Parviluciferaceae, and environmental clusters (referred to as 'Perkinsea environmental cluster 01-04'). These clusters exhibited robust node support in both maximum-likelihood and Bayesian inferences (TBE > 0.8 and PP > 0.90, Fig. S1). However, a few sequences, such as M. nigrum (MN721813.1 and MN721814.1) and P. dinoexitiosum (MZ663823.1 and MZ663830.1), displayed low branch support, underscoring the uncertainty in their phylogenetic placement as described in previous analyses 12,13.

Diversity and distribution of Perkinsea sequences
To investigate the diversity and distribution patterns of the identified Perkinsea, we classified the 4034 samples into three main environmental types: 'Marine' (2567 samples), 'Land water' (410 samples), and 'Soil' (1057 samples). Within these categories, we detected 1001 Perkinsea ASVs in 'Marine' samples, 601 ASVs in 'Land water', and 269 ASVs in 'Soil'. The sequencing effort for each environment, represented by the percentage of Perkinsea ASVs out of the total number of reads, ranged from approximately 0.1% in both 'Marine' and 'Soil' environments to 2.4% in 'Land water'. These findings are consistent with the results of Jobard et al. 25.
The rest of the 'Marine' samples presented very dissimilar communities and could be separated into different groups: 'Coastal zone' samples were positioned close to 'Marine sediments' samples (sharing 76% of the 'Coastal zone' ASVs) and were separated from 'Bathypelagic', 'Mesopelagic' and 'Abyssal zone' samples. These last three environments share ~30% of the ASVs (Table S1). However, despite the relatively high number of shared ASVs, the communities differed significantly, including between 'Epipelagic zone' and 'Marine sediment', which shared 42% of the ASVs (p value < 0.05, Table S2). Alpha diversity indices, including the Shannon index and phylogenetic diversity (PD), indicated that 'Land water' exhibited the highest diversity, with mean values of 1.1 and 1.3, respectively. In contrast, 'Marine' and 'Soil' samples showed lower mean alpha diversity and PD values, averaging around 0.4 (Fig. S4). Additionally, 'Land water' had the highest mean number of observed ASVs per sample (11 ASVs), followed by 'Marine' (2.3 ASVs per sample) and 'Soil' (2 ASVs per sample) (Fig. S4).
Regarding the Net Relatedness Index (NRI) analysis, higher indices were observed in marine water, particularly in the sub-category of 'deep' samples from the Abyssal zone, with a mean NRI of 1.18 and PD value of 0.69 (Fig. S4). This result suggests the presence of phylogenetically clustered ASVs in the 'deep' samples. However, rarefaction curves indicated that only the diversity of land water had been adequately sampled, as the curve reached a plateau, revealing that the Perkinsea diversity retrieved in 'Marine' and 'Soil' samples is still undersampled (Fig. S5).
A significant number of soil ASVs were exclusively retrieved from Temperate forest samples (96 ASVs), followed by Tropical forest samples (25 ASVs), and finally, Temperate Land soils (5 ASVs). However, these results may be influenced by the variable number of samples across each environment, which introduces potential biases. Nonetheless, we observed many ASVs shared between soil environments, even among contrasting ones such as Temperate and Tropical forests (14 ASVs). This indicates that potential Soil-dwelling Perkinsea organisms can adapt to diverse habitats.

Potential novel Perkinsea taxa
By applying specific criteria based on the LWR and the percentage of similarity (mean LWR < 0.69 and the mean % similarity < 94.8%), 428 ASVs potentially related to novel groups were identified (Fig. S7). These 'novel' ASVs branched widely across the reference tree (Fig. 4). Among them, 320 ASVs were detected in 'Marine' environments, 95 in 'Land water' and 50 in 'Soil' environments. The novel ASVs closely related to Parviluciferaceae were detected in marine sediments. ASVs related to Perkinsidae were mostly detected in the epipelagic and bathypelagic zones, except for one ASV detected in Abyssal zone samples (Fig. 4). Novel ASVs related to NAG01 were mainly retrieved from lakes and land water sediments (22 ASVs). Still, a significant number of these novel ASVs were also recovered from the marine epipelagic zone (7 ASVs) and the soil samples (3 ASVs in Temperate and 2 ASVs in Tropical Forest soil samples). 'Perkinsea cluster 01' novel ASVs were mostly detected in Land water samples, with a few observations in marine 'epipelagic zone' (3 ASVs) and sediment samples (3 ASVs). 'Perkinsea cluster 02' novel ASVs were retrieved in a wide range of samples, including 36 ASVs in soil and land water samples, 10 ASVs in marine samples, four ASVs in the Bathypelagic zone, and two ASVs in sediment samples. We can distinguish two groups among the 'unclassified Perkinsea' novel ASVs. Those branching close to marine Perkinsea lineages were mostly detected in the Epipelagic zone and marine sediments, following the same patterns as the closely related defined lineages. However, those placed basal to NAG01 and 'Perkinsea cluster 01 to 03' showed a different distribution, with some ASVs solely retrieved in marine samples (17 ASVs) and others detected in both land waters and soil samples (18 ASVs). In total, 13 novel ASVs were detected within the three contrasting environments ('Marine', 'Land water' and 'Soil').

Xcellidae distribution in open oceans
One noteworthy finding was that the ASVs related to Xcellidae were mostly detected in the Mesopelagic zone (between 200 and 1000 m depth) and exhibited a global distribution across all the different oceans (Fig. S6). However, it is important to consider that the detected eDNA could originate from various sources, including free-living organisms, potentially infected host organisms, metabolically inactive or dead cells, or free DNA 28. To investigate this hypothesis, we conducted a case study using the published Malaspina Expedition circumnavigation dataset. The expedition took place from 2010 to 2011 and involved sampling the tropical and subtropical regions of the Atlantic, Indian, and Pacific Oceans. This expedition is unique because (i) it covered vertical water column profiles spanning seven depths, ranging from the surface to 4000 m, and (ii) both rDNA and rRNA were used as templates for environmental sequencing. Using this dataset, we investigated the 'relative' ribosomal activity of Xcellidae in marine samples using DNA- and RNA-derived sequencing 58. We identified 90 ASVs related to Xcellidae. These ASVs contributed less than 0.1% of the total reads in both DNA and RNA samples. We analyzed the rRNA/rDNA ratio, which can be a proxy for the 'relative' ribosomal activity. Our results highlight that, in most mesopelagic samples, the Xcellidae ASVs exhibited higher relative rRNA contributions than rDNA. This indicates potential ribosomal activity, suggesting the presence of putatively living organisms related to Xcellidae in the mesopelagic zone of the marine water column (Fig. S8).

Discussion
In recent years, high-throughput sequencing or metabarcoding has been extensively used in eDNA surveys to investigate the diversity and distribution of protists 61. The availability of large datasets, such as the EukBank project, which compiles 13,055 unique samples from a wide range of habitats worldwide (including marine, land water and soil), as well as the MetaPR2 dataset comprising over 5036 samples for the V4 region and 1166 samples for the V9 region of the SSU rRNA 62, presents a unique and valuable opportunity to explore the community structure of Perkinsea in a wide range of ecosystems.

Perkinsea distribution and potential colonization
Our study revealed that Perkinsea is a genetically diverse lineage, with 1568 ASVs identified and distributed worldwide. Marine environments were found to harbor the highest number of Perkinsea ASVs (1001 ASVs), although this could be influenced by the large number of marine samples (2567) compared to other environments (Fig. S5). The three main marine lineages of Perkinsea, namely Parviluciferaceae (318 ASVs), Xcellidae (52 ASVs), and Perkinsidae (39 ASVs), were particularly prominent. Their distribution exhibited a distinct latitudinal gradient. Xcellidae ASVs were mainly found in open ocean samples from middle latitudes, between 20 and 60 degrees, while Parviluciferaceae and 'unclassified Perkinsea' were more prevalent in high-latitude samples, between 60 and 90 degrees (Fig. S6). The presence of Parviluciferaceae ASVs at high latitudes, including polar samples, may be attributed to recurrent harmful algal blooms in the Arctic in recent years 63. The Arctic region is known to host at least five families of noxious micro-algae, including the PSP-producer Alexandrium catenella 63, which is susceptible to infection by Parvilucifera sinerae (Parviluciferaceae), previously described and isolated from the Mediterranean Sea 64. These findings significantly expand the environmental range of Xcellidae and Parviluciferaceae organisms to previously unexplored marine regions such as open ocean and polar regions, respectively. Furthermore, a gradient was observed within the marine samples throughout the water column. Specifically, samples from the Mesopelagic, Bathypelagic and Abyssal zones exhibited more similar community structures based on beta diversity (amount of differentiation between species/ASV communities) than samples from the epipelagic zone. Indeed, samples from the Mesopelagic and Bathypelagic zones did not differ significantly in the pairwise ANOSIM analysis based on the weighted UniFrac distance (p value = 0.193, Table S2). Interestingly, the diversity indices of the epipelagic zone were more similar to those of the 'Land water' samples, suggesting a potential differentiation of genetic signatures between the deep ocean and the surface or epipelagic zones.
In contrast to marine environments, Perkinsea contributed significantly to land water samples, accounting for approximately 2.7% of the total reads, with an average of 11 ASVs per sample and a maximum of 69 ASVs in the Sanabria Lake sample. These values were significantly higher than those observed in marine environments, indicating a greater presence and diversity of Perkinsea in land water ecosystems. Lakes and rivers, being physically and chemically more heterogeneous than the ocean and lacking a continuous geographical distribution, may favor local endemism and, therefore, a higher Perkinsea diversity 5. Additionally, land water samples exhibited a high phylogenetic diversity (PD index) (Fig. S4), suggesting the coexistence of diverse Perkinsea lineages within the same sample. Previous studies have suggested that freshwater Perkinsea may act as parasites 27,30. Indeed, Jobard et al. showed that freshwater Perkinsea in Lakes Aydat and Bourget (France), during both summer and winter, were associated with Sphaerocystis (Chlorophyceae) and cyanobacteria filaments 25. This highlights the potential parasitic lifestyle of Perkinsea and their putative role in influencing host-related phytoplankton dynamics in freshwater ecosystems. However, due to the vast genetic diversity retrieved in freshwater ecosystems and the lack of culture-based studies, it remains unclear whether Perkinsea organisms can adopt other lifestyles. Further research is needed to unravel the ecological roles and functional characteristics of Perkinsea in freshwater ecosystems.
Although Perkinsea has primarily been associated with marine and freshwater environments 19,25,27,30, our study identified 269 ASVs in soil samples (Fig. 2A). These 'Soil' Perkinsea ASVs branched into four different [...] 6 ASVs) (Fig. 3). One hypothesis for these results is the potential windblown spread of freshwater Perkinsea, similar to what has been suggested for Dinophytes in Neotropical rainforest soil 65. However, our analyses revealed that 129 ASVs were exclusively detected in soil samples and that their communities and phylogenetic diversity differ significantly from those of land water (Fig. 2A and Table S2). Among the shared ASVs, only 14 were shared between samples from Tropical Forests (high precipitation with periods of flooding) and Temperate Forest soils (wet and dry soils), highlighting different environmental preferences of soil Perkinsea (Table S1). Furthermore, NMDS analysis revealed a continuum between 'Soil' and 'Land water' samples based on their diversity composition, from 'Land water' to Temperate forests (Fig. 2B). Previous studies have shown that soil samples are rich in parasitic protists, which can influence animal diversity by limiting the population growth of locally abundant species 6. On the other hand, the low contribution of these ASVs to the total number of reads in soil samples (< 0.1%) suggests that Perkinsea organisms may contribute to the soil seed bank, similar to the findings for Parviluciferaceae in marine coastal sediments 28,66-68. Among soil subcategories, 86 ASVs were exclusively retrieved in temperate forest samples (Table S1), corresponding to only three locations in the Northern Hemisphere (Switzerland, Norway and Canada). The combination of these results with the low number of ASVs shared between soil environments (35 ASVs) may reflect the lower dispersal capabilities of soil Perkinsea organisms (e.g., by wind and animals) 69,70 compared to aquatic protists. This is especially notable in marine environments, where protists are likely to have higher dispersal due to high connectivity 70-72. Therefore, future studies employing isolation and culture-based approaches are needed to confirm or refute the presence of Perkinsea in soil samples and to understand their role in these ecosystems.
During the diversification of Perkinsea, a transition from marine to freshwater environments occurred, as previously discussed by Chambouvet et al. 28 and Bråte et al. 30. This demonstrates the ability of the Perkinsea group to colonize and thrive in contrasting environments 30. Our study also suggests that a transition to soil environments may have occurred during the evolution of Perkinsea. The NMDS analysis based on beta diversity and the PCoA based on phylogenetic distance (Fig. 2B,C) showed distinct community compositions among the primary environments, indicating a potential transition from marine environments to freshwater and soil environments (Fig. S3). However, a more robust phylogenetic analysis with a complete 18S rRNA gene is needed to better understand how many times this transition happened during the evolution of Perkinsea.

Perkinsea cryptic diversity
Our analysis uncovered many new cryptic taxonomic groups within the Perkinsea lineage. These groups can be categorized into two types: (i) sequences that could be assigned to known taxonomic groups (~27% of the total ASVs, ASVs classified as 'Novel' ASVs, LWR < 0.69 and the mean % similarity < 94.8%) and (ii) sequences that could not be assigned to any known taxonomic group (~35.1% of the total ASVs, ASVs classified as 'unclassified Perkinsea') (Figs. 4 and S7).
One group that exemplifies this cryptic diversity is the Perkinsidae, which currently includes seven described parasitic species known to infect bivalves and gastropods (Mollusks) 8. Among them, the two infectious agents, P. marinus and P. olseni, are listed as notifiable diseases by the World Organization for Animal Health (O.I.E.) 8,10. Interestingly, these Perkinsidae ASVs (16 ASVs) have been recovered in many marine ecosystems, spanning from the epipelagic zone to the abyssal zone (Fig. 4). This is noteworthy because most of the described Perkinsus species have been isolated from coastal marine areas, where they infect commercially important host species 8. Furthermore, the detection of two of these ASVs in marine sediment samples aligns with the known life cycle of described Perkinsus species, which involves a dormant stage in the sediment 28. The presence of Perkinsidae ASVs in diverse marine environments suggests that the diversity and distribution of this group may be broader than previously recognized. Similar results were observed for novel ASVs within the Parviluciferaceae, where all the ASVs were exclusively retrieved in marine waters, mostly from the Epipelagic zone and Sediment samples. This is consistent with the known characteristics of described species within this group, which are parasites of dinoflagellates, a prominent group in terms of abundance and diversity in all oceans 11. Environmental ASV sequences were also recovered from marine sediments, which could be explained by the meroplanktonic life cycle of described Parviluciferaceae. In this life cycle, the activation of sporangia (benthic stage) and the release of infective free-living zoospores depend on host density 11,68,73. A time-series analysis is necessary to describe the life cycle of these potential novel organisms and how biotic or abiotic environmental factors could activate the resting stages that govern their transition between the water column and sediments 67.
Lastly, our analysis revealed the presence of six ASVs within the Xcellidae group that showed very low sequence similarity (< 0.6% of similarity) to known species, with only one ASV classified as novel (Fig. S7). Until now, only five species have been described within the Xcellidae family as parasites of 20 fish species belonging to five orders of teleosts 8. This result potentially indicates the presence of sequences related to species that are currently not represented in public databases.
The detection of novel ASVs with low sequence similarity to known species within all known groups, including Perkinsidae, Parviluciferaceae, and Xcellidae, as well as those without any known taxonomic group, suggests that the diversity within Perkinsea may be much wider than previously thought.This is important because it raises intriguing questions regarding these putative organisms' nature (e.g., symbiotic or free-living).
The Xcellidae group has remained enigmatic as it has not been previously reported in environmental surveys, except for a single ASV from the bathypelagic zone of the Gulf of California 15. However, our analysis has provided, for the first time, the distribution of genetic signatures related to Xcellidae in open ocean samples, particularly in the mesopelagic zone (between 200 and 1000 m depth), where these sequences dominate the Perkinsea dataset. The mesopelagic zone is a particular zone of the water column as it harbors the largest biomass of fish in the open ocean 74. Therefore, the presence of Xcellidae genetic signatures in this zone raises the possibility of these sequences being associated with infectious agents of fish. Indeed, Freeman et al. 15 proposed that Xcellidae infections may occur through contact between fish and the benthos or fish-to-fish transmission. However, this study analyzed ASVs derived from DNA templates that could correspond to eDNA, free-living life stages released during their life cycle, and living or dead hosts. To overcome this issue, we analyzed the Malaspina dataset, which includes sequences derived from DNA and RNA templates. We showed that most of the Xcellidae ASVs were also recovered from the RNA template in the water column, suggesting the presence of potentially 'ribosomally' active life stages (Fig. S8). These findings provide valuable insights into the potential activity and ecological significance of putative organisms related to Xcellidae in the mesopelagic zone of the open ocean.

Conclusions
Our study provides evidence that the Perkinsea lineage might be diverse and globally distributed. Marine environments harbor the highest number of Perkinsea ASVs, followed by land water samples. Surprisingly, we also identified a significant number of ASVs in soil samples, highlighting the need for further investigation to uncover the role of Perkinsea, especially in soil ecosystems. Numerous unclassified and novel ASVs within the Perkinsea lineage indicate potential cryptic diversity and underscore the importance of exploring and characterizing this lineage in greater detail.
Our analysis also reveals the dominance of Xcellidae within Perkinsea communities in the mesopelagic zone of the open ocean, suggesting a potential association with fish hosts and/or the presence of a free-living infective life stage. These findings provide valuable insights into the potential ecological roles of Xcellidae in marine ecosystems, particularly in relation to fish hosts.
Using these results as the foundation of future work, the next challenge will be to confirm or refute whether these genetic signatures are linked to specific organisms and/or represent novel taxa. Further research, including isolation and culturing efforts, as well as morphological and genomic characterization, will be essential to identify these organisms and understand their nature (e.g., symbiotic or free-living) and ecological roles. Ultimately, further studies on Perkinsea diversity and ecological roles are crucial for effective management and mitigation of the potential invasive and pathogenic impacts already described for specific species within this lineage. By gaining a comprehensive understanding of their diversity, distribution, and ecological interactions, we will be able to develop conservation strategies and mitigate potential environmental and economic consequences.

Figure 1. Phylogenetic placement of ASVs into the Perkinsea reference tree. The groups of ASVs were collapsed and highlighted in pink. The red numbers correspond to the number of ASVs in each collapsed cluster.

Figure 2. (A) Venn diagram of ASVs shared across environments. (B) Non-metric multi-dimensional scaling ordination plot (NMDS) based on Bray-Curtis distances, with 87 samples from 'Marine' environments, 53 from 'Land water' and 40 from 'Soil'. (C) PCoA based on the phylogenetic distance of the samples.

Figure 3. Distribution of Perkinsea ASVs into the different sub-environment categories. Each plot represents a specific main environment, such as Marine (A), Land water (B), and Soil (C). The description of each sub-category per environment is next to each plot. The width of the connectors is the number of ASVs corresponding to the specific taxonomic group. The taxonomic group description is in the lower boxed panel and applies to all plots.